{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"text-align: center;\">\t\n",
    "\n",
    "# Lecture 3: Conditional Probability and Bayes rule \n",
    "## Instructor： 胡传鹏（博士）[Dr. Hu Chuan-Peng]\n",
    "### 南京师范大学心理学院[School of Psychology, Nanjing Normal University]\n",
    " \n",
    "## Part 1: Conditional Probability (条件概率)\n",
    "\t\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Recap of previous lecture\n",
    "\n",
    "### 计算与概率\n",
    "### 证据更新与贝叶斯法则\n",
    "### 贝叶斯与频率学派对比"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "![Image Name](https://cdn.kesci.com/upload/image/rhqd6akbc6.gif?imageView2/0/w/640/h/640)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "|                     | 频率学派   | 贝叶斯学派   |\n",
    "| ------------------- | ---------- | ------------ |\n",
    "| 世界真相 (参数) | 固定       | 变化         |\n",
    "| 概率                | 抽样的噪音 | 信念         |\n",
    "| 推断过程            | NHST       | 贝叶斯定理   |\n",
    "| 数据                | 存在噪音   | 固定         |\n",
    "| 推断可更新性        | 否         | 是           |\n",
    "| 主观性              | 前提预设   | 通过先验设定 |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 贝叶斯公式：\n",
    "####  $Posterior = \\frac {probability \\, of \\, data \\, * \\, prior}{Average \\, probability \\, of \\, data}$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### $P(A|B) = \\frac{P(A) * P(B|A)} {P(B)}$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Core concept today:\n",
    "\n",
    "**$P(A|B)$**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在已知其他事件发生的前提下，我们想知道某个事件发生的概率有多大，这就是所谓的条件概率。\n",
    "\n",
    "关注的焦点是样本空间的一个子集，这类概率被称为条件概率，记作$P(A|B)$或$Pr(A|B)$，读作“已知B时A的概率”。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在生活中，条件概率随处可见：\n",
    "- 对一个65岁且不抽烟的人来说，他得肺癌的概率是多少？\n",
    "- 清晨，发现窗外的马路是湿的，昨晚下过雨的概率是多少（在秋季的南京）\n",
    "- 清晨，发现窗外的马路是湿的，昨晚下过雨的概率是多少（在秋季的新疆）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 计数与条件概率"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "某资深渔友，去一处自己熟悉的河流垂钓，并制定了一个计划：一旦钓到鱼或者等待了4个小时，就停止钓鱼。\n",
    "\n",
    "做如何假定：\n",
    "* 该河钓到鲫鱼的概率是40%；\n",
    "* 钓到鲈鱼的概率是25%；\n",
    "* 钓不到鱼的概率是35%\n",
    "\n",
    "*三个概率的和是1，概率公理之一：整体样本集合中的某个基本事件发生的概率为1*\n",
    "\n",
    "假设，该渔友在4小时内钓到了一条鱼。\n",
    "\n",
    "问题：这条鱼恰好是鲫鱼的概率是多少？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过计数来解决概率问题：\n",
    "\n",
    "假定该渔友不知疲倦，去该河进行了1000垂钓，则\n",
    "* 400次是鲫鱼；\n",
    "* 250次钓到鲈鱼；\n",
    "* 350次没有收获。\n",
    "\n",
    "在上述情景下：钓到一次鱼的总次数是400+250 = 650，其中\n",
    "* 400次是鲫鱼，\n",
    "\n",
    "那么，钓到一条鱼且该鱼为鲫鱼的可能性是：$\\frac{400}{650} \\approx 61.5\\% $。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们将A表示钓到了一条鲫鱼，B表示钓到了鱼，$P(A)$和$P(B)$分别来表示钓到鲫鱼和钓到鱼的概率。\n",
    "\n",
    "那么$P(A)=0.4$，$P(B)=0.4+0.25=0.65$。\n",
    "\n",
    "如果我们钓到了一条鲫鱼，说明我们钓到了鱼，也就是说当事件A发生时，事件B已经发生了，即$A \\in B$.\n",
    "\n",
    "我们将这个概率定义为$P(A \\cap B)= P(A) = 0.4$。\n",
    "\n",
    "$P(A|B) = \\frac {P(A \\cap B)} {P(B)} = \\frac {0.4} {0.65}$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们是否可以反过来？\n",
    "\n",
    "$P(B|A) = ?$"
   ]
  },
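  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a worked answer to the question above, here is a short derivation that uses only the quantities already defined ($A$: a crucian carp is caught, $B$: a fish is caught). It applies the same definition of conditional probability with the roles of $A$ and $B$ swapped:\n",
    "\n",
    "$$\n",
    "P(B|A) = \\frac{P(A \\cap B)}{P(A)} = \\frac{0.4}{0.4} = 1\n",
    "$$\n",
    "\n",
    "So $P(B|A) = 1$ while $P(A|B) \\approx 0.615$: in general, reversing the condition changes the probability."
   ]
  },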
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 条件概率与疾病诊断"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在我国，每100人有3人患抑郁症。\n",
    "\n",
    "假设小明情绪低落，去医院检查是否患抑郁症，医生告诉小明，抑郁症检测出现假阳性的概率是1%，也就是说每100个健康人中会有一个人的测试为阳性。医生还告诉小明，这个测试假阴性率为0.1%，即每1000抑郁症患者中，只有一人会被检测为阴性。假设小明检测为阳性，那么得抑郁症的概率是多少？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "已知：\n",
    "$$\n",
    "P(阳性｜抑郁) = 1 - P(阴性 |抑郁) = 1 - 0.1\\% = 99.9\\%\n",
    "$$\n",
    "\n",
    "\n",
    "$$\n",
    "P(阳性｜健康) = 1\\%\n",
    "$$\n",
    "\n",
    "$$\n",
    "P(抑郁) = 3\\%\n",
    "$$\n",
    "\n",
    "求：\n",
    "$$\n",
    "P(抑郁|阳性)\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们可以假设现在有100000人，这样设定是来自问题中100人中有3人患有抑郁症，我们可以通过树状图看到更形象的展示：\n",
    "\n",
    "![image4](./figure/Fig3_5.png)"
   ]
  },
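  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The counts behind such a tree can be worked out by hand from the three given rates. The following is a sketch of that arithmetic, assuming a population of 100000 and treating each rate as an exact proportion:\n",
    "\n",
    "$$\n",
    "\\text{depressed} = 100000 \\times 3\\% = 3000, \\qquad \\text{healthy} = 100000 - 3000 = 97000\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\text{depressed and positive} = 3000 \\times 99.9\\% = 2997, \\qquad \\text{depressed and negative} = 3000 \\times 0.1\\% = 3\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\text{healthy and positive} = 97000 \\times 1\\% = 970, \\qquad \\text{healthy and negative} = 97000 \\times 99\\% = 96030\n",
    "$$"
   ]
  },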
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "所以当小明检测为阳性，又患抑郁症的概率为\n",
    "$$\n",
    "P(抑郁| 阳性) = \\frac {抑郁且阳性的人数}{抑郁且阳性的人数 + 抑郁但健康的人数} = \\frac{2997}{970+2997} \\approx 0.755\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果我们用条件概率的知识，我们已知$P(阳性|健康)$和$P(阴性|抑郁)$，希望求出$P(抑郁|阳性)$，我们可以根据上面的例子得到："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "P(抑郁|阳性)=\\frac{P(抑郁)\\times P(阳性|抑郁)} {P(阳性)}\n",
    "$$"
   ]
  },
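  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The denominator $P(\\text{positive})$ is not given directly in the problem. As a worked sketch, it can be obtained from the law of total probability, using $P(\\text{healthy}) = 1 - P(\\text{depressed}) = 97\\%$:\n",
    "\n",
    "$$\n",
    "P(\\text{positive}) = P(\\text{depressed}) \\times P(\\text{positive}|\\text{depressed}) + P(\\text{healthy}) \\times P(\\text{positive}|\\text{healthy}) = 0.03 \\times 0.999 + 0.97 \\times 0.01 = 0.03967\n",
    "$$\n",
    "\n",
    "$$\n",
    "P(\\text{depressed}|\\text{positive}) = \\frac{0.03 \\times 0.999}{0.03967} = \\frac{0.02997}{0.03967} \\approx 0.755\n",
    "$$\n",
    "\n",
    "This agrees with the count-based result $\\frac{2997}{2997+970}$ for a population of 100000."
   ]
  },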
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这就是我们说的贝叶斯定理，或者贝叶斯公式"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.9.13 64-bit (microsoft store)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "7698be9997df07547d4a08fe6d7a8ab77df170c0c470ab0ace6a1c514673cd42"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
